
Q&A for Legal Texts

LLM, Vector Space Model, Information Retrieval, Deep Learning, Data Scientist, Machine Learning, Web Scraping, Vector Database, OpenSearch, Python, Haystack, NumPy, Docker, Web UI, Vue.js, JavaScript, Flask, AWS Cloud, Git

As part of this project, an AI-based question-and-answer system tailored specifically to the German Residence Act was developed. Users can ask questions about the legal texts in natural language and receive, as answers, the most relevant text excerpts from the legal texts and associated documents. Because both the questions and the documents are written in natural language, the project falls within the field of Natural Language Processing (NLP).

Architecture of the system

The system is based on the open-source library Haystack, which provides building blocks for question-and-answer systems built on state-of-the-art language models. At its core, the system works in four steps:

  1. Embedding the legal texts: The relevant legal texts, as well as information from websites, are converted into embeddings and stored in a vector database. This follows the vector space model, in which each document is represented as a vector in a high-dimensional space.

  2. Embedding the questions: Users' questions are embedded in the same vector space so that questions and documents can be compared semantically. Unlike classic keyword-based search, this comparison also recognizes synonyms, i.e. passages that answer the question without repeating its exact words.

  3. Searching the vector database: The question embedding is used to find the most similar document embeddings in the vector database, which identifies the text excerpts that semantically match the question.

  4. Displaying the results: The retrieved text excerpts are shown in a user-friendly web UI, together with direct references to the corresponding legal documents.
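Steps 1 to 3 can be illustrated with a minimal, self-contained sketch of a vector store that ranks stored excerpts by cosine similarity to a question embedding. It is not the project's code: the `InMemoryVectorStore` class, the excerpt IDs and the placeholder texts are hypothetical, and the three-dimensional vectors are made up, whereas in the actual system the embeddings come from BERT and are kept in a vector database rather than a Python list.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database: (id, text, embedding) triples."""

    entries: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add(self, doc_id: str, text: str, embedding: list[float]) -> None:
        # Step 1: store each text excerpt together with its embedding.
        self.entries.append((doc_id, text, embedding))

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, str, float]]:
        # Step 3: score every stored excerpt against the question embedding
        # and return the top_k most similar ones (brute force, no index).
        scored = [(doc_id, text, cosine(query, emb)) for doc_id, text, emb in self.entries]
        scored.sort(key=lambda hit: hit[2], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
# Made-up 3-dimensional embeddings; BERT-base would produce 768-dimensional ones.
store.add("excerpt-1", "Placeholder excerpt on entry requirements.", [0.9, 0.1, 0.0])
store.add("excerpt-2", "Placeholder excerpt on settlement permits.", [0.2, 0.9, 0.1])
store.add("excerpt-3", "Placeholder excerpt on family reunification.", [0.0, 0.2, 0.9])

# Step 2: the question is embedded with the same model (here: a made-up vector).
question_embedding = [0.1, 0.8, 0.2]
hits = store.search(question_embedding, top_k=1)
print(hits[0][0])  # excerpt-2
```

Cosine similarity compares only the direction of two vectors, so an excerpt is not ranked higher merely because its embedding has a larger norm. A real vector database would typically replace the brute-force loop with an approximate nearest-neighbor index.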

graph TB
    A[Legal texts and web pages] -->|Embedding| B[Vector database]
    B -->|Relevant text excerpts| C[Web UI]
    C -->|Question| B

Implementation

Backend

The backend of the system was developed with Flask, a lightweight Python framework for building web applications quickly. Haystack and large language models (LLMs) were used to process the questions and search the vector database. The focus was on the BERT language model, obtained via the Hugging Face platform, which is used to create embeddings for both the legal texts and the users' questions.
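BERT returns one vector per token, so a single embedding per passage or question has to be derived from those token vectors. The text does not say which pooling strategy was used; the sketch below assumes mean pooling over the non-padding tokens followed by L2 normalization, a common choice for sentence-embedding models (taking the `[CLS]` token vector is an alternative). The helper names `mean_pool` and `l2_normalize` are hypothetical, and the token vectors are invented, using 3 dimensions instead of the 768 of BERT-base.

```python
import math


def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1; padding positions are skipped."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Guard against an all-padding input (returns the zero vector).
    return [value / count for value in summed] if count else summed


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale to unit length so that a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


# Hypothetical model output for 4 token positions; the last one is padding.
tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 4.0, 1.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(mean_pool(tokens, mask))  # [2.0, 2.0, 1.0]
```

Applying the same pooling to both the legal texts and the questions is what places them in one shared vector space, which the similarity search relies on.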

Frontend

The web UI was developed with Vue.js and the UI framework Vuetify. It displays the retrieved text excerpts together with references to the corresponding documents, so that users can reach the information they are looking for directly and with little interaction.

Conclusion

With this AI-based system, an efficient question-and-answer tool for legal texts was developed, built on Deep Learning, Natural Language Processing and Large Language Models. The system offers a user-friendly way to search complex legal texts and find relevant information quickly and accurately.

Activities

  • Creation of text datasets on the German Residence Act by crawling several relevant websites as well as the Residence Act itself (Python scripts)
  • Application of the Haystack framework
  • Integration of Large Language Models (LLMs)
  • Evaluation of various AI models
  • Provision of a web frontend to demonstrate the question-and-answer system (Vue.js, Python Flask)
  • Deployment of the application in the AWS Cloud
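Before crawled legal texts can be embedded, they have to be split into passages. A minimal sketch, assuming that each section of the Residence Act starts on its own line with `§ <number>` followed by a title: it produces one passage per section and keeps a citable reference such as `§ 1 AufenthG` that the web UI could show next to the excerpt. The heading pattern, the `split_into_sections` helper and the field names are hypothetical, since the project's crawling and chunking scripts are not described, and the sample text is placeholder content, not statute text.

```python
import re

# Assumption: a section heading is a line such as "§ 4 Some title" or "§ 2a Some title".
SECTION_HEADING = re.compile(r"^§\s*(\d+[a-z]?)\s+(.*)$", re.MULTILINE)


def split_into_sections(law_text: str, law_name: str = "AufenthG") -> list[dict]:
    """Split a statute into one passage per section, keeping a citable reference."""
    matches = list(SECTION_HEADING.finditer(law_text))
    passages = []
    for i, match in enumerate(matches):
        # A section's body runs from the end of its heading to the next heading.
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(law_text)
        passages.append({
            "reference": f"§ {match.group(1)} {law_name}",
            "title": match.group(2).strip(),
            "content": law_text[start:end].strip(),
        })
    return passages


sample = (
    "§ 1 First section\n"
    "Placeholder text of the first section.\n"
    "§ 2a Second section\n"
    "Placeholder text,\n"
    "spanning two lines.\n"
)
sections = split_into_sections(sample)
print([s["reference"] for s in sections])  # ['§ 1 AufenthG', '§ 2a AufenthG']
```

Splitting along section boundaries keeps each passage legally self-contained and makes the reference shown in the UI exact; very long sections might still need further splitting to fit the input length limit of a BERT model.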